Massive-Scale RDF Processing Using Compressed Bitmap Indexes
Abstract
The Resource Description Framework (RDF) is a popular data model for representing linked data sets arising from the web, as well as large scientific data repositories such as UniProt. RDF data intrinsically represents a labeled and directed multigraph. SPARQL is a query language for RDF that expresses subgraph pattern-finding queries on this implicit multigraph in a SQL-like syntax. SPARQL queries generate complex intermediate join queries; to compute these joins efficiently, we propose a new strategy based on bitmap indexes. We store the RDF data in column-oriented structures as compressed bitmaps along with two dictionaries. This paper makes three new contributions. (i) We present an efficient parallel strategy for parsing the raw RDF data, building dictionaries of unique entities, and creating compressed bitmap indexes of the data. (ii) We utilize the constructed bitmap indexes to efficiently answer SPARQL queries, simplifying the join evaluations. (iii) To quantify the performance impact of using bitmap indexes, we compare our approach to the state-of-the-art triple-store RDF-3X. We find that our bitmap index-based approach to answering queries is up to an order of magnitude faster for a variety of SPARQL queries, on gigascale RDF data sets.
Keywords: semantic data, RDF, SPARQL query optimization, compressed bitmap indexes, large-scale data analysis
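The general idea of answering a join with bitmap operations can be illustrated with a minimal sketch. It is not the paper's implementation: the index layout (one bitmap per predicate-object pair), the split into an entity dictionary and a predicate dictionary, the function names `build_index` and `subjects_matching`, and the sample triples are all assumptions made for illustration, and plain Python integers stand in for compressed bitmaps.

```python
from collections import defaultdict

# Hypothetical sample data; not taken from the paper.
TRIPLES = [
    ("P1", "type", "Protein"),
    ("P2", "type", "Protein"),
    ("G1", "type", "Gene"),
    ("P1", "locatedIn", "Nucleus"),
    ("G1", "locatedIn", "Nucleus"),
    ("P2", "locatedIn", "Membrane"),
]


def build_index(triples):
    """Dictionary-encode terms and build one bitmap per (predicate, object)."""
    entity_ids = {}     # subject/object string -> integer ID (assumed layout)
    predicate_ids = {}  # predicate string -> integer ID (assumed layout)
    # (predicate ID, object ID) -> bitmap whose set bits are subject IDs.
    # Uncompressed Python ints are used here in place of compressed bitmaps.
    bitmaps = defaultdict(int)
    for s, p, o in triples:
        s_id = entity_ids.setdefault(s, len(entity_ids))
        p_id = predicate_ids.setdefault(p, len(predicate_ids))
        o_id = entity_ids.setdefault(o, len(entity_ids))
        bitmaps[(p_id, o_id)] |= 1 << s_id
    return entity_ids, predicate_ids, bitmaps


def subjects_matching(index, patterns):
    """Return subjects ?x satisfying every pattern (?x, predicate, object).

    The join on the shared variable ?x reduces to a bitwise AND of bitmaps.
    """
    entity_ids, predicate_ids, bitmaps = index
    result = -1  # all bits set: the identity element for AND
    for p, o in patterns:
        key = (predicate_ids.get(p), entity_ids.get(o))
        result &= bitmaps.get(key, 0)  # unknown terms match nothing
    names = {i: name for name, i in entity_ids.items()}
    return sorted(names[i] for i in names if (result >> i) & 1)


index = build_index(TRIPLES)
print(subjects_matching(index, [("type", "Protein"), ("locatedIn", "Nucleus")]))
```

Here the two patterns share the variable in subject position, so the join needs no intermediate result table: each pattern selects one bitmap and the answer is their intersection. A compressed format would perform the same AND directly on the compressed representation.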
Similar resources
SBH: Super byte-aligned hybrid bitmap compression
Bitmap indexes are commonly used in data warehousing applications such as on-line analytic processing (OLAP). Storing the bitmaps in compressed form has been shown to be effective not only for low cardinality attributes, as conventional wisdom would suggest, but also for high cardinality attributes. Compressed bitmap indexes, such as Byte-aligned been shown to be efficient in terms of both time...
Performance of Multi-Level and Multi-Component Compressed Bitmap Indexes
Bitmap indexes are known as the most effective indexing methods for range queries on append-only data, especially for low cardinality attributes. Recently, bitmap indexes were also shown to be just as effective for high cardinality attributes when certain compression methods are applied. There are many different bitmap indexes in the literature but no definite comparison among them has been mad...
Compressing Bitmap Indexes for Faster Search Operations
In this paper, we study the effects of compression on bitmap indexes. The main operations on the bitmaps during query processing are bitwise logical operations such as AND, OR, NOT, etc. Using general-purpose compression schemes, such as gzip, the logical operations on the compressed bitmaps are much slower than on the uncompressed bitmaps. Specialized compression schemes, like the byte-aligned...
Compressed Spatial Hierarchical Bitmap (cSHB) Indexes for Efficiently Processing Spatial Range Query Workloads
In most spatial data management applications, objects are represented in terms of their coordinates in a 2-dimensional space and search queries in this space are processed using spatial index structures. On the other hand, bitmap-based indexing, especially thanks to the compression opportunities bitmaps provide, has been shown to be highly effective for query processing workloads including sele...
Better bitmap performance with Roaring bitmaps
Bitmap indexes are commonly used in databases and search engines. By exploiting bit-level parallelism, they can significantly accelerate queries. However, they can use much memory, and thus we might prefer compressed bitmap indexes. Following Oracle’s lead, bitmaps are often compressed using run-length encoding (RLE). Building on prior work, we introduce the Roaring compressed bitmap format: it...
Publication date: 2011